Overview

Dataset statistics

Number of variables: 6
Number of observations: 243003
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 11.1 MiB
Average record size in memory: 48.0 B
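
The memory figures above can be cross-checked against each other. The sketch below recomputes the total size from the reported observation count and average record size. The 8-bytes-per-value figure is an assumption inferred from the numbers (6 columns × 8 B = 48 B); the report itself does not state it.

```python
# Values copied from the Dataset statistics above.
n_observations = 243_003
n_variables = 6
avg_record_bytes = 48.0

MIB = 2 ** 20  # bytes per mebibyte

# Total size implied by the average record size.
total_mib = n_observations * avg_record_bytes / MIB

# Assumption: every column stores one 8-byte value per row.
bytes_per_value = avg_record_bytes / n_variables
column_mib = n_observations * bytes_per_value / MIB

print(f"total: {total_mib:.1f} MiB")
print(f"per column: {column_mib:.1f} MiB")
```

Under that assumption, the per-column result agrees with the memory size of 1.9 MiB listed for each variable.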

Variable types

Numeric: 2
Text: 2
Categorical: 2

Reproduction

Analysis started: 2024-06-03 00:45:16.781874
Analysis finished: 2024-06-03 00:48:45.281106
Duration: 3 minutes and 28.5 seconds
Software version: ydata-profiling v4.8.3
Download configuration: config.json
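
A report of this kind can be regenerated with ydata-profiling. The following is a minimal sketch, assuming the data is available as a CSV file. The function name, both file paths, and the report title are hypothetical, and the default settings used here may differ from those in config.json.

```python
def build_report(csv_path: str, html_path: str = "report.html") -> None:
    """Profile a CSV file and write the result as an HTML report.

    csv_path and html_path are hypothetical: the report does not say
    where the source data lives or where the output was written.
    """
    # Imported inside the function so the module still loads
    # on machines where pandas or ydata-profiling is not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(html_path)
```

Calling `build_report("ratings.csv")` (hypothetical file name) would write `report.html` to the working directory.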

Variables

movie_id
Real number (ℝ)

Distinct: 8884
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19916.397
Minimum: 1
Maximum: 193609
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 163
Q1: 1200
Median: 2993
Q3: 8880
95-th percentile: 99114
Maximum: 193609
Range: 193608
Interquartile range (IQR): 7680

Descriptive statistics

Standard deviation: 34917.772
Coefficient of variation (CV): 1.7532174
Kurtosis: 3.8101205
Mean: 19916.397
Median Absolute Deviation (MAD): 2452
Skewness: 2.0760662
Sum: 4.8397441 × 10^9
Variance: 1.2192508 × 10^9
Monotonicity: Not monotonic
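
Several of the statistics above are derived from others, so they can be recomputed as a consistency check. The sketch below uses the standard definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count). Small differences in the last digits are expected because the inputs are already rounded.

```python
import math

# Values copied from the movie_id statistics above.
n = 243_003
minimum, maximum = 1, 193_609
q1, q3 = 1_200, 8_880
mean = 19_916.397
std = 34_917.772

value_range = maximum - minimum   # reported: 193608
iqr = q3 - q1                     # reported: 7680
cv = std / mean                   # reported: 1.7532174
variance = std ** 2               # reported: 1.2192508 × 10^9
total = mean * n                  # reported: 4.8397441 × 10^9

print(value_range, iqr)
print(f"{cv:.4f} {variance:.4e} {total:.4e}")

# Compare against the reported values with a small relative tolerance.
assert math.isclose(cv, 1.7532174, rel_tol=1e-6)
```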
Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
356                  1316    0.5%
296                  1228    0.5%
1                    1075    0.4%
79132                1001    0.4%
480                  952     0.4%
380                  890     0.4%
2959                 872     0.4%
593                  837     0.3%
780                  808     0.3%
260                  753     0.3%
Other values (8874)  233271  96.0%

Minimum 10 values

Value  Count  Frequency (%)
1      1075   0.4%
2      330    0.1%
3      104    < 0.1%
4      21     < 0.1%
5      49     < 0.1%
6      306    0.1%
7      108    < 0.1%
8      16     < 0.1%
9      16     < 0.1%
10     396    0.2%

Maximum 10 values

Value   Count  Frequency (%)
193609  1      < 0.1%
193587  2      < 0.1%
193585  1      < 0.1%
193583  3      < 0.1%
193573  1      < 0.1%
193571  2      < 0.1%
193567  2      < 0.1%
193565  4      < 0.1%
191005  4      < 0.1%
190221  1      < 0.1%

title
Text

Distinct: 8881
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Length

Max length: 158
Median length: 102
Mean length: 25.068608
Min length: 8

Characters and Unicode

Total characters: 6091747
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1135
Unique (%): 0.5%

Sample

1st row: Toy Story (1995)
2nd row: Toy Story (1995)
3rd row: Toy Story (1995)
4th row: Toy Story (1995)
5th row: Toy Story (1995)

Words

Value                Count   Frequency (%)
the                  90154   9.0%
of                   26496   2.6%
1995                 15779   1.6%
1994                 13205   1.3%
1996                 11040   1.1%
and                  9837    1.0%
1997                 9687    1.0%
1999                 9643    1.0%
2000                 8947    0.9%
2001                 8903    0.9%
Other values (8960)  803271  79.8%

Most occurring characters

Value               Count    Frequency (%)
(space)             763997   12.5%
e                   418818   6.9%
a                   279011   4.6%
)                   261135   4.3%
(                   261135   4.3%
9                   258428   4.2%
o                   254364   4.2%
r                   240613   3.9%
n                   231076   3.8%
t                   219672   3.6%
Other values (110)  2903498  47.7%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  6091747  100.0%

Most frequent character per category

(unknown): same values and counts as the table under Most occurring characters.

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  6091747  100.0%

Most frequent character per script

(unknown): same values and counts as the table under Most occurring characters.

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  6091747  100.0%

Most frequent character per block

(unknown): same values and counts as the table under Most occurring characters.

genres
Categorical

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Value              Count
Drama              39562
Comedy             34357
Action             28910
Thriller           25166
Adventure          20459
Other values (15)  94549

Length

Max length: 18
Median length: 11
Mean length: 6.4145422
Min length: 3

Characters and Unicode

Total characters: 1558753
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Fantasy
2nd row: Comedy
3rd row: Children
4th row: Animation
5th row: Adventure

Common Values

Value              Count  Frequency (%)
Drama              39562  16.3%
Comedy             34357  14.1%
Action             28910  11.9%
Thriller           25166  10.4%
Adventure          20459  8.4%
Romance            16405  6.8%
Sci-Fi             16114  6.6%
Crime              15428  6.3%
Fantasy            9926   4.1%
Mystery            7495   3.1%
Other values (10)  29181  12.0%

Length

Histogram of lengths of the category

Words

Value              Count  Frequency (%)
drama              39562  16.3%
comedy             34357  14.1%
action             28910  11.9%
thriller           25166  10.4%
adventure          20459  8.4%
romance            16405  6.7%
sci-fi             16114  6.6%
crime              15428  6.3%
fantasy            9926   4.1%
mystery            7495   3.1%
Other values (12)  29239  12.0%

Most occurring characters

Value              Count   Frequency (%)
r                  167891  10.8%
e                  149940  9.6%
a                  126238  8.1%
i                  116658  7.5%
m                  110284  7.1%
o                  98139   6.3%
n                  89641   5.8%
t                  72430   4.6%
c                  64756   4.2%
d                  60309   3.9%
Other values (24)  502467  32.2%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  1558753  100.0%

Most frequent character per category

(unknown): same values and counts as the table under Most occurring characters.

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  1558753  100.0%

Most frequent character per script

(unknown): same values and counts as the table under Most occurring characters.

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  1558753  100.0%

Most frequent character per block

(unknown): same values and counts as the table under Most occurring characters.

director_name
Text

Distinct: 3567
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Length

Max length: 32
Median length: 24
Mean length: 13.11799
Min length: 3

Characters and Unicode

Total characters: 3187711
Distinct characters: 87
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 454
Unique (%): 0.2%

Sample

1st row: John Lasseter
2nd row: John Lasseter
3rd row: John Lasseter
4th row: John Lasseter
5th row: John Lasseter

Words

Value                Count   Frequency (%)
john                 11057   2.2%
david                8243    1.6%
steven               7649    1.5%
robert               7533    1.5%
james                5991    1.2%
spielberg            5921    1.2%
peter                5324    1.1%
michael              5126    1.0%
richard              4817    1.0%
george               4325    0.9%
Other values (4210)  435383  86.8%

Most occurring characters

Value              Count    Frequency (%)
e                  312830   9.8%
(space)            258366   8.1%
n                  243777   7.6%
a                  237911   7.5%
r                  225611   7.1%
o                  197679   6.2%
i                  177977   5.6%
l                  129614   4.1%
t                  117414   3.7%
s                  105499   3.3%
Other values (77)  1181033  37.0%

Most occurring categories

Value      Count    Frequency (%)
(unknown)  3187711  100.0%

Most frequent character per category

(unknown): same values and counts as the table under Most occurring characters.

Most occurring scripts

Value      Count    Frequency (%)
(unknown)  3187711  100.0%

Most frequent character per script

(unknown): same values and counts as the table under Most occurring characters.

Most occurring blocks

Value      Count    Frequency (%)
(unknown)  3187711  100.0%

Most frequent character per block

(unknown): same values and counts as the table under Most occurring characters.

user_id
Real number (ℝ)

Distinct: 610
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 325.57233
Minimum: 1
Maximum: 610
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5-th percentile: 32
Q1: 177
Median: 325
Q3: 477
95-th percentile: 600
Maximum: 610
Range: 609
Interquartile range (IQR): 300

Descriptive statistics

Standard deviation: 182.34269
Coefficient of variation (CV): 0.56006816
Kurtosis: -1.1758802
Mean: 325.57233
Median Absolute Deviation (MAD): 150
Skewness: -0.074420324
Sum: 79115052
Variance: 33248.858
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
414                 6082    2.5%
599                 5659    2.3%
448                 4313    1.8%
474                 4294    1.8%
610                 3326    1.4%
274                 3317    1.4%
380                 3316    1.4%
68                  3080    1.3%
249                 2697    1.1%
606                 2468    1.0%
Other values (600)  204451  84.1%

Minimum 10 values

Value  Count  Frequency (%)
1      595    0.2%
2      75     < 0.1%
3      89     < 0.1%
4      444    0.2%
5      95     < 0.1%
6      673    0.3%
7      398    0.2%
8      123    0.1%
9      97     < 0.1%
10     324    0.1%

Maximum 10 values

Value  Count  Frequency (%)
610    3326   1.4%
609    89     < 0.1%
608    2025   0.8%
607    472    0.2%
606    2468   1.0%
605    638    0.3%
604    242    0.1%
603    2012   0.8%
602    328    0.1%
601    240    0.1%

rating
Categorical

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.9 MiB

Value             Count
4.0               64254
3.0               48532
3.5               31965
5.0               31733
4.5               20971
Other values (5)  45548

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 729009
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4.0
2nd row: 4.0
3rd row: 4.0
4th row: 4.0
5th row: 4.0

Common Values

Value  Count  Frequency (%)
4.0    64254  26.4%
3.0    48532  20.0%
3.5    31965  13.2%
5.0    31733  13.1%
4.5    20971  8.6%
2.0    18189  7.5%
2.5    13242  5.4%
1.0    6611   2.7%
1.5    4204   1.7%
0.5    3302   1.4%
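
The profiler treats rating as Categorical, with string labels of length 3, although the labels are numeric. The sketch below, assuming the labels can be parsed as floats, recomputes the Frequency (%) column from the counts and derives a count-weighted mean. That mean is per row of this dataset, which is not necessarily the same as a mean per individual rating.

```python
# Counts copied from the Common Values table for rating.
rating_counts = {
    "4.0": 64254, "3.0": 48532, "3.5": 31965, "5.0": 31733, "4.5": 20971,
    "2.0": 18189, "2.5": 13242, "1.0": 6611, "1.5": 4204, "0.5": 3302,
}

# The counts should add up to the number of observations (243003).
total = sum(rating_counts.values())

# Frequency (%) column, rounded to one decimal as in the report.
shares = {r: round(100 * c / total, 1) for r, c in rating_counts.items()}

# Parsing the string labels as floats gives a count-weighted mean rating.
mean_rating = sum(float(r) * c for r, c in rating_counts.items()) / total

print(total)
print(shares)
print(round(mean_rating, 2))
```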

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

Words

Same values and counts as the table under Common Values.

Most occurring characters

Value  Count   Frequency (%)
.      243003  33.3%
0      172621  23.7%
5      105417  14.5%
4      85225   11.7%
3      80497   11.0%
2      31431   4.3%
1      10815   1.5%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  729009  100.0%

Most frequent character per category

(unknown): same values and counts as the table under Most occurring characters.

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  729009  100.0%

Most frequent character per script

(unknown): same values and counts as the table under Most occurring characters.

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  729009  100.0%

Most frequent character per block

(unknown): same values and counts as the table under Most occurring characters.

Interactions

[Figures: 4 interaction plots]

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

        movie_id  title                           genres     director_name   user_id  rating
0       1         Toy Story (1995)                Fantasy    John Lasseter   1        4.0
1       1         Toy Story (1995)                Comedy     John Lasseter   1        4.0
2       1         Toy Story (1995)                Children   John Lasseter   1        4.0
3       1         Toy Story (1995)                Animation  John Lasseter   1        4.0
4       1         Toy Story (1995)                Adventure  John Lasseter   1        4.0
5       3         Grumpier Old Men (1995)         Romance    Howard Deutch   1        4.0
6       3         Grumpier Old Men (1995)         Comedy     Howard Deutch   1        4.0
7       6         Heat (1995)                     Thriller   Michael Mann    1        4.0
8       6         Heat (1995)                     Crime      Michael Mann    1        4.0
9       6         Heat (1995)                     Action     Michael Mann    1        4.0

Last rows

        movie_id  title                           genres     director_name   user_id  rating
242993  168248    John Wick: Chapter Two (2017)   Thriller   Chad Stahelski  610      5.0
242994  168248    John Wick: Chapter Two (2017)   Crime      Chad Stahelski  610      5.0
242995  168248    John Wick: Chapter Two (2017)   Action     Chad Stahelski  610      5.0
242996  168250    Get Out (2017)                  Horror     Jordan Peele    610      5.0
242997  168252    Logan (2017)                    Sci-Fi     James Mangold   610      5.0
242998  168252    Logan (2017)                    Action     James Mangold   610      5.0
242999  170875    The Fate of the Furious (2017)  Thriller   F. Gary Gray    610      3.0
243000  170875    The Fate of the Furious (2017)  Drama      F. Gary Gray    610      3.0
243001  170875    The Fate of the Furious (2017)  Crime      F. Gary Gray    610      3.0
243002  170875    The Fate of the Furious (2017)  Action     F. Gary Gray    610      3.0
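
In the sample rows, each (user_id, movie_id) pair appears once per genre with the same rating repeated. This suggests a long format with one row per user, movie, and genre, though that reading is an assumption based only on the rows shown. The sketch below collapses the ten first rows to one entry per (user_id, movie_id) using plain Python.

```python
# The ten first rows from the Sample section, as
# (movie_id, title, genre, director_name, user_id, rating) tuples.
rows = [
    (1, "Toy Story (1995)", "Fantasy", "John Lasseter", 1, "4.0"),
    (1, "Toy Story (1995)", "Comedy", "John Lasseter", 1, "4.0"),
    (1, "Toy Story (1995)", "Children", "John Lasseter", 1, "4.0"),
    (1, "Toy Story (1995)", "Animation", "John Lasseter", 1, "4.0"),
    (1, "Toy Story (1995)", "Adventure", "John Lasseter", 1, "4.0"),
    (3, "Grumpier Old Men (1995)", "Romance", "Howard Deutch", 1, "4.0"),
    (3, "Grumpier Old Men (1995)", "Comedy", "Howard Deutch", 1, "4.0"),
    (6, "Heat (1995)", "Thriller", "Michael Mann", 1, "4.0"),
    (6, "Heat (1995)", "Crime", "Michael Mann", 1, "4.0"),
    (6, "Heat (1995)", "Action", "Michael Mann", 1, "4.0"),
]

# Group by (user_id, movie_id): collect the genres, keep a single rating.
ratings = {}
for movie_id, title, genre, director, user_id, rating in rows:
    key = (user_id, movie_id)
    entry = ratings.setdefault(
        key, {"title": title, "rating": rating, "genres": []}
    )
    entry["genres"].append(genre)

for (user_id, movie_id), entry in ratings.items():
    print(user_id, movie_id, entry["title"], entry["rating"], entry["genres"])
```

If the long-format reading holds for the full dataset, counts such as the rating frequencies are weighted by the number of genres per movie, which is worth keeping in mind when interpreting them.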